Threshold Selection in Feature Screening for Error Rate Control

Authors

Abstract

The hard thresholding rule is commonly adopted in feature screening procedures to screen out unimportant predictors for ultrahigh-dimensional data. However, different thresholds are required to adapt to different contexts of screening problems, and an appropriate magnitude usually varies with the model and the error distribution. With an ad hoc choice, it is unclear whether all of the important predictors are selected, and it is very likely that the procedure would also include many unimportant features. We introduce a data-adaptive threshold selection procedure with error rate control, which is applicable to most kinds of popular screening methods. The key idea is to apply a sample-splitting strategy to construct a series of statistics with a marginal symmetry property and then to utilize the symmetry to obtain an approximation of the number of false discoveries. We show that the proposed method is able to asymptotically control the false discovery rate and the per-family error rate under certain conditions while still retaining all of the important predictors. Three examples are presented to illustrate the merits of the new procedures. Numerical experiments indicate that the methodology works well with many existing screening methods. Supplementary materials for this article are available online.
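The sketch below illustrates the general flavor of such a procedure, not the authors' exact method: the data are split in half, a marginal statistic is computed on each half, and their product is (heuristically) symmetric about zero for null predictors, so the negative tail can be used to estimate the number of false discoveries at any candidate threshold. The marginal-correlation statistic, the product form of the mirror statistic, and the target level q are illustrative assumptions.

```python
# Minimal sketch (not the authors' exact procedure): sample splitting plus a
# marginal symmetry property used to pick a screening threshold with
# FDR-type control. All specific choices here are assumptions for illustration.
import numpy as np

def select_threshold(X, y, q=0.1, rng=None):
    rng = np.random.default_rng(rng)
    n, p = X.shape
    idx = rng.permutation(n)
    half1, half2 = idx[: n // 2], idx[n // 2:]

    def marginal_stats(rows):
        # marginal correlation of each predictor with the response, standing in
        # for whichever marginal screening utility is being thresholded
        Xc = X[rows] - X[rows].mean(axis=0)
        yc = y[rows] - y[rows].mean()
        return (Xc * yc[:, None]).sum(axis=0) / (
            np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12
        )

    z1, z2 = marginal_stats(half1), marginal_stats(half2)
    W = z1 * z2  # roughly symmetric about 0 for null predictors

    # pick the smallest threshold t whose estimated false discovery proportion
    # (negative tail over positive tail) does not exceed the target level q
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = np.sum(W <= -t) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t, np.flatnonzero(W >= t)
    return np.inf, np.array([], dtype=int)
```

A common design choice in this family of methods, reflected above, is to take the smallest threshold whose estimated false discovery proportion stays below the target level.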


Similar resources

H-BwoaSvm: A Hybrid Model for Classification and Feature Selection of Mammography Screening Behavior Data

Breast cancer is one of the most common cancers in the world. Early detection of cancer significantly reduces morbidity rates and treatment costs. Mammography is a known, effective method for diagnosing breast cancer. One way to identify mammography screening behavior is to evaluate women's awareness of participating in mammography screening programs. Today, intelligent systems could...


Minimum Bayes error feature selection

We consider the problem of designing a linear transformation θ ∈ ℝ^{p×n}, of rank p ≤ n, which projects the features of a classifier x ∈ ℝ^n onto y = θx ∈ ℝ^p such as to achieve minimum Bayes error (or probability of misclassification). Two avenues will be explored: the first is to maximize the θ-average divergence between the class densities and the second is to minimize the union Bhattacharyya bound in the r...
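As a rough illustration of the second criterion mentioned above, the sketch below evaluates the Bhattacharyya bound on the Bayes error for two Gaussian classes after a given rank-p projection θ. The optimization of θ, which is the substance of the cited work, is not reproduced here; the Gaussian class-conditional model and the prior p1 are illustrative assumptions.

```python
# Minimal sketch: Bhattacharyya bound on the Bayes error of a two-class
# Gaussian problem after projecting with a fixed matrix theta (p x n).
import numpy as np

def bhattacharyya_bound(theta, mu1, S1, mu2, S2, p1=0.5):
    # project the class means and covariances into the low-dimensional space
    m1, m2 = theta @ mu1, theta @ mu2
    C1, C2 = theta @ S1 @ theta.T, theta @ S2 @ theta.T
    C = 0.5 * (C1 + C2)
    d = m1 - m2
    # Bhattacharyya distance between the two projected Gaussians
    B = 0.125 * d @ np.linalg.solve(C, d) + 0.5 * np.log(
        np.linalg.det(C) / np.sqrt(np.linalg.det(C1) * np.linalg.det(C2))
    )
    # upper bound on the Bayes error of the projected problem
    return np.sqrt(p1 * (1 - p1)) * np.exp(-B)
```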


Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets

Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...
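For context, the sketch below gives a generic version of the sequential forward floating selection (SFFS) search that this family of wrapper methods builds on. The scoring criterion is left abstract and is an assumption here, not the evaluation used in the cited paper.

```python
# Minimal sketch of sequential forward floating selection (SFFS).
# `score(subset)` returns a quality measure (higher is better), e.g. a
# cross-validated accuracy supplied by the caller.
def sffs(score, n_features, k_target):
    selected, best_at_size = [], {}
    while len(selected) < k_target:
        # inclusion: add the single feature that yields the best score
        remaining = [f for f in range(n_features) if f not in selected]
        selected.append(max(remaining, key=lambda f: score(selected + [f])))
        best_at_size[len(selected)] = score(selected)
        # conditional exclusion: keep dropping a feature while doing so beats
        # the best score previously recorded for that smaller subset size
        while len(selected) > 2:
            worst = max(selected, key=lambda f: score([s for s in selected if s != f]))
            reduced = [s for s in selected if s != worst]
            if score(reduced) > best_at_size.get(len(reduced), float("-inf")):
                selected = reduced
                best_at_size[len(selected)] = score(selected)
            else:
                break
    return selected
```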


Proposed Feature Selection for Dynamic Thermal Management in Multicore Systems

Increasing the number of cores to meet the demand for more computing power has led to higher processor temperatures in multi-core systems. One of the main approaches to reducing temperature is dynamic thermal management. These methods are divided into two classes, reactive and proactive. Proactive methods manage the processor temperature by forecasting the temperature be...
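As a toy illustration of the proactive idea (not the method proposed in the cited paper), the sketch below forecasts the next core temperature from a short history and throttles before a fixed limit is reached; the forecasting rule, the limit, and the actions are all assumptions.

```python
# Minimal sketch of a proactive DTM decision: a one-step linear-trend forecast
# of the core temperature triggers throttling before the limit is crossed.
def proactive_dtm(temps_so_far, t_limit=85.0, alpha=0.9):
    """Return a suggested action given recent core temperature samples (in C)."""
    if len(temps_so_far) < 2:
        return "keep-frequency"
    current, previous = temps_so_far[-1], temps_so_far[-2]
    # simple trend-based forecast: next ~ current + alpha * (current - previous)
    forecast = current + alpha * (current - previous)
    return "throttle-frequency" if forecast >= t_limit else "keep-frequency"

# example: a rising trace is predicted to cross the limit, so we throttle early
print(proactive_dtm([78.0, 81.0, 84.0]))  # -> "throttle-frequency"
```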


Impact of error estimation on feature selection

Given a large set of potential features, it is usually necessary to find a small subset with which to classify. The task of finding an optimal feature set is inherently combinatorial, and therefore suboptimal algorithms are typically used to find feature sets. If feature selection is based directly on classification error, then a feature-selection algorithm must base its decision on error estimat...
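The sketch below is a hedged illustration of this point: the same greedy forward search is driven by two different error estimators (resubstitution versus cross-validation), which can lead to different selected subsets. The LDA classifier and the synthetic data are assumptions made only for illustration.

```python
# Minimal sketch: the choice of error estimator shapes error-based selection.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def greedy_select(X, y, k, error_of):
    # plain greedy forward search driven by the supplied error estimate
    selected = []
    for _ in range(k):
        remaining = [f for f in range(X.shape[1]) if f not in selected]
        selected.append(min(remaining, key=lambda f: error_of(X[:, selected + [f]], y)))
    return selected

def resub_error(Xs, y):
    clf = LinearDiscriminantAnalysis().fit(Xs, y)
    return 1.0 - clf.score(Xs, y)          # optimistic resubstitution estimate

def cv_error(Xs, y):
    return 1.0 - cross_val_score(LinearDiscriminantAnalysis(), Xs, y, cv=5).mean()

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100) > 0).astype(int)
print("resubstitution:", greedy_select(X, y, 3, resub_error))
print("cross-validation:", greedy_select(X, y, 3, cv_error))
```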



Journal

Journal title: Journal of the American Statistical Association

Year: 2022

ISSN: 0162-1459, 1537-274X, 2326-6228, 1522-5445

DOI: https://doi.org/10.1080/01621459.2021.2011735